[aoti-et] Add cuda delegate runtime code #14827
Dr. CI: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14827
Note: Links to docs will display an error until the docs builds have been completed.
1 active SEV. 1 new failure, 120 pending, as of commit a9bb409 with merge base d8e07bd.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -0,0 +1,374 @@
/*
Why not keep the bulk of the code under backends/aoti and keep only the CUDA-specific AOTI runtime bits here? The rationale is code dedup across all AOTI backends.
Yeah good point
Next PR
} // extern "C"

// AOTI Delegate Handle structure
Nit: if this backend can't be instantiated directly, then perhaps s/aoti/_aoti?
Can you say more?
        
          
examples/cuda/scripts/export.py (Outdated)
          
        
exec_program = delegated_program.to_executorch()
save_pte_program(exec_program, args.model_name, args.output_dir)
if args.generate_etrecord:
    exec_program.get_etrecord().save(f"{args.model_name}_cuda_etrecord.bin")
can we do etdump on aoti runtime?
Probably. Still trying to figure out how to do etdump for aoti, will probably defer to @Gasoonjia
We can definitely do something on the ET side (e.g. treating every delegate call as a black box), but we need some time to support it inside the delegate.
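The "black box" idea in the comment above could be sketched roughly as follows: the caller times the whole delegate call from the outside and records one opaque event, without any visibility into individual kernels. This is only an illustration. `DelegateEvent`, `profile_delegate_call`, and the callable standing in for the delegate are hypothetical names, not ExecuTorch or ETDump APIs.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one delegate invocation; not an ETDump type.
struct DelegateEvent {
  std::string name;
  long long duration_ns;
};

// Times a whole delegate call as a single opaque event and appends it to `log`.
// Returns whatever status code the wrapped call returned.
inline int profile_delegate_call(
    const std::string& name,
    const std::function<int()>& execute,
    std::vector<DelegateEvent>& log) {
  assert(execute && "delegate callable must be set");
  const auto start = std::chrono::steady_clock::now();
  const int status = execute();  // the delegate runs here, unobserved
  const auto end = std::chrono::steady_clock::now();
  const auto ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
  log.push_back({name, static_cast<long long>(ns)});
  return status;
}
```

Profiling inside the delegate (per-kernel events) would need hooks in the delegate itself, which is the part the comment says needs more time.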
f46a3a5 to 6e58d47
  
extern "C" {

// Type definitions
using AOTITensorHandle = Tensor*;
I think we can use Tensor* directly; in the other places we've removed the alias.
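As a small illustration of why dropping the alias is a mechanical change: the alias and the raw pointer name the same type, so a shim declared with `Tensor*` still accepts existing `AOTITensorHandle` values. The `Tensor` struct and `example_get_numel` below are hypothetical stand-ins, not the PR's actual definitions.

```cpp
#include <type_traits>

// Stand-in for the runtime's tensor class (hypothetical, for illustration only).
struct Tensor {
  int numel;
};

// The alias under discussion: it names exactly the same type as Tensor*.
using AOTITensorHandle = Tensor*;
static_assert(
    std::is_same<AOTITensorHandle, Tensor*>::value,
    "alias and raw pointer are the same type");

extern "C" {
// Hypothetical shim declared with the raw pointer type instead of the alias.
// Returns 0 on success and 1 on a null argument.
int example_get_numel(Tensor* tensor, int* out) {
  if (tensor == nullptr || out == nullptr) {
    return 1;
  }
  *out = tensor->numel;
  return 0;
}
}
```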
This pull request introduces comprehensive support for the CUDA backend in ExecuTorch, enabling model export, build, and runtime execution with CUDA acceleration. It adds new CMake build logic, implements the CUDA backend runtime, updates workflow automation for CUDA model testing, and improves type and error handling for CUDA-specific operations.
CUDA Backend Integration
- … CMakeLists.txt, including registration of the aoti_cuda backend and dependencies on common AOTI and CUDA-specific sources. (CMakeLists.txt, [1]; backends/cuda/CMakeLists.txt, [2])
- … CudaBackend runtime in cuda_backend.cpp, handling dynamic loading of model containers, GPU tensor management, and execution flow for CUDA kernels. (backends/cuda/runtime/cuda_backend.cpp, R1-R383)

Workflow and Testing Automation
- … (.github/workflows/cuda.yml, R64-R87)
- … (.ci/scripts/test_model.sh, [1] [2] [3])

Type and Error Handling Improvements
- … INT64 and updating error messages for unsupported dtypes. (backends/cuda/runtime/shims/utils.h, [1] [2] [3])
- … (backends/aoti/aoti_model_container.h, [1] [2])

Miscellaneous
- … (backends/aoti/CMakeLists.txt, L33-R35)
- … (examples/cuda/scripts/__init__.py, R1-R7)
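The dtype handling mentioned under "Type and Error Handling Improvements" might look something like the sketch below: a whitelist check plus an error message that names the rejected code. The numeric codes follow PyTorch's c10::ScalarType numbering (Long = 4, Float = 6, BFloat16 = 15). The supported set and the names `Dtype`, `is_dtype_supported`, and `validate_dtype` are assumptions for illustration and may not match what utils.h actually contains.

```cpp
#include <cstdint>
#include <string>

// Numeric codes follow c10::ScalarType in PyTorch.
enum class Dtype : int32_t {
  Int64 = 4,
  Float32 = 6,
  BFloat16 = 15,
};

// ASSUMPTION: this supported set is illustrative; the PR's actual set may differ.
inline bool is_dtype_supported(int32_t dtype) {
  switch (dtype) {
    case static_cast<int32_t>(Dtype::Int64):
    case static_cast<int32_t>(Dtype::Float32):
    case static_cast<int32_t>(Dtype::BFloat16):
      return true;
    default:
      return false;
  }
}

// Returns an empty string on success, otherwise a message naming the
// offending dtype code and listing the codes this sketch accepts.
inline std::string validate_dtype(int32_t dtype) {
  if (is_dtype_supported(dtype)) {
    return "";
  }
  return "Unsupported dtype: " + std::to_string(dtype) +
      ". Supported dtypes: 4 (int64), 6 (float32), 15 (bfloat16)";
}
```

Listing the accepted codes in the message makes an export-time dtype mismatch easier to diagnose from the runtime log alone.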